The dataset is from https://www.kaggle.com. The file size is about 547 MB.
- Load the dataset into a data frame using Pandas
- Explore the number of rows & columns, ranges of values etc.
- Handle missing, incorrect and invalid data
- Perform any additional steps (parsing dates, creating additional columns, merging multiple datasets etc.)

- Compute the mean, sum, range and other interesting statistics for numeric columns
- Explore relationships between columns using scatter plots, bar charts etc.
- Make a note of interesting insights from the exploratory analysis
- Your notebook should contain at least 8 graphs & 4 different types of graphs

- Ask at least 8 interesting questions about your dataset
- Answer the questions either by computing the results using Numpy/Pandas or by plotting graphs using Matplotlib/Seaborn/Plotly/Folium
- Create new columns, merge multiple datasets and perform grouping/aggregation wherever necessary
- For each question, summarize the key insight from the analysis or visualization in simple words

- Write a summary of what you've learned from the analysis
- Include interesting insights and graphs from previous sections
- Share ideas for future work on the same topic using other relevant datasets
- Share links to resources you found useful during your analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
df=pd.read_csv('zomato.csv')
df
| | url | address | name | online_order | book_table | rate | votes | phone | location | rest_type | dish_liked | cuisines | approx_cost(for two people) | reviews_list | menu_item | listed_in(type) | listed_in(city) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.zomato.com/bangalore/jalsa-banasha... | 942, 21st Main Road, 2nd Stage, Banashankari, ... | Jalsa | Yes | Yes | 4.1/5 | 775 | 080 42297555\r\n+91 9743772233 | Banashankari | Casual Dining | Pasta, Lunch Buffet, Masala Papad, Paneer Laja... | North Indian, Mughlai, Chinese | 800 | [('Rated 4.0', 'RATED\n A beautiful place to ... | [] | Buffet | Banashankari |
| 1 | https://www.zomato.com/bangalore/spice-elephan... | 2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ... | Spice Elephant | Yes | No | 4.1/5 | 787 | 080 41714161 | Banashankari | Casual Dining | Momos, Lunch Buffet, Chocolate Nirvana, Thai G... | Chinese, North Indian, Thai | 800 | [('Rated 4.0', 'RATED\n Had been here for din... | [] | Buffet | Banashankari |
| 2 | https://www.zomato.com/SanchurroBangalore?cont... | 1112, Next to KIMS Medical College, 17th Cross... | San Churro Cafe | Yes | No | 3.8/5 | 918 | +91 9663487993 | Banashankari | Cafe, Casual Dining | Churros, Cannelloni, Minestrone Soup, Hot Choc... | Cafe, Mexican, Italian | 800 | [('Rated 3.0', "RATED\n Ambience is not that ... | [] | Buffet | Banashankari |
| 3 | https://www.zomato.com/bangalore/addhuri-udupi... | 1st Floor, Annakuteera, 3rd Stage, Banashankar... | Addhuri Udupi Bhojana | No | No | 3.7/5 | 88 | +91 9620009302 | Banashankari | Quick Bites | Masala Dosa | South Indian, North Indian | 300 | [('Rated 4.0', "RATED\n Great food and proper... | [] | Buffet | Banashankari |
| 4 | https://www.zomato.com/bangalore/grand-village... | 10, 3rd Floor, Lakshmi Associates, Gandhi Baza... | Grand Village | No | No | 3.8/5 | 166 | +91 8026612447\r\n+91 9901210005 | Basavanagudi | Casual Dining | Panipuri, Gol Gappe | North Indian, Rajasthani | 600 | [('Rated 4.0', 'RATED\n Very good restaurant ... | [] | Buffet | Banashankari |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 51712 | https://www.zomato.com/bangalore/best-brews-fo... | Four Points by Sheraton Bengaluru, 43/3, White... | Best Brews - Four Points by Sheraton Bengaluru... | No | No | 3.6 /5 | 27 | 080 40301477 | Whitefield | Bar | NaN | Continental | 1,500 | [('Rated 5.0', "RATED\n Food and service are ... | [] | Pubs and bars | Whitefield |
| 51713 | https://www.zomato.com/bangalore/vinod-bar-and... | Number 10, Garudachar Palya, Mahadevapura, Whi... | Vinod Bar And Restaurant | No | No | NaN | 0 | +91 8197675843 | Whitefield | Bar | NaN | Finger Food | 600 | [] | [] | Pubs and bars | Whitefield |
| 51714 | https://www.zomato.com/bangalore/plunge-sherat... | Sheraton Grand Bengaluru Whitefield Hotel & Co... | Plunge - Sheraton Grand Bengaluru Whitefield H... | No | No | NaN | 0 | NaN | Whitefield | Bar | NaN | Finger Food | 2,000 | [] | [] | Pubs and bars | Whitefield |
| 51715 | https://www.zomato.com/bangalore/chime-sherato... | Sheraton Grand Bengaluru Whitefield Hotel & Co... | Chime - Sheraton Grand Bengaluru Whitefield Ho... | No | Yes | 4.3 /5 | 236 | 080 49652769 | ITPL Main Road, Whitefield | Bar | Cocktails, Pizza, Buttermilk | Finger Food | 2,500 | [('Rated 4.0', 'RATED\n Nice and friendly pla... | [] | Pubs and bars | Whitefield |
| 51716 | https://www.zomato.com/bangalore/the-nest-the-... | ITPL Main Road, KIADB Export Promotion Industr... | The Nest - The Den Bengaluru | No | No | 3.4 /5 | 13 | +91 8071117272 | ITPL Main Road, Whitefield | Bar, Casual Dining | NaN | Finger Food, North Indian, Continental | 1,500 | [('Rated 5.0', 'RATED\n Great ambience , look... | [] | Pubs and bars | Whitefield |
51717 rows × 17 columns
print('The DataFrame Contains about {} rows and columns respectively'.format(df.shape))
The DataFrame Contains about (51717, 17) rows and columns respectively
print('The DataFrame contains the following columns {} '.format(df.columns))
The DataFrame contains the following columns Index(['url', 'address', 'name', 'online_order', 'book_table', 'rate', 'votes',
'phone', 'location', 'rest_type', 'dish_liked', 'cuisines',
'approx_cost(for two people)', 'reviews_list', 'menu_item',
'listed_in(type)', 'listed_in(city)'],
dtype='object')
print('Statistically, the following can be described for the DataFrame {} '.format(df.describe()))
Statistically, the following can be described for the DataFrame               votes
count  51717.000000
mean     283.697527
std      803.838853
min        0.000000
25%        7.000000
50%       41.000000
75%      198.000000
max    16832.000000
print('In total, the DataFrame has the null values of columns {}'.format(df.isnull().sum()))
In total, the DataFrame has the null values of columns url                                0
address                            0
name                               0
online_order                       0
book_table                         0
rate                            7775
votes                              0
phone                           1208
location                          21
rest_type                        227
dish_liked                     28078
cuisines                          45
approx_cost(for two people)      346
reviews_list                       0
menu_item                          0
listed_in(type)                    0
listed_in(city)                    0
dtype: int64
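The null counts are easier to compare across columns when expressed as a share of all rows. A minimal sketch on a toy frame (the `toy` data and the `missing_pct` name are illustrative assumptions, not part of the original notebook):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Zomato data (hypothetical values).
toy = pd.DataFrame({
    'rate': ['4.1/5', np.nan, np.nan, '3.8/5'],
    'votes': [775, 0, 0, 918],
})

# isnull() gives booleans; their mean is the fraction of missing values.
missing_pct = toy.isnull().mean().mul(100).round(1)
print(missing_pct)
```

On the real data the same expression would be `df.isnull().mean().mul(100)`, which should make it plain that dish_liked is missing in more than half of the rows.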
The url, address, menu_item, phone and reviews_list columns are not required, so I am going to drop them. In the other columns, I will replace the null values with appropriate values. The rating is given out of 5, so I am going to keep only the rating itself and remove the '/5' suffix from the column.
# Drop all columns that are not needed for the analysis in one call
df = df.drop(columns=['url', 'address', 'menu_item', 'phone', 'reviews_list'])
df
| | name | online_order | book_table | rate | votes | location | rest_type | dish_liked | cuisines | approx_cost(for two people) | listed_in(type) | listed_in(city) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Jalsa | Yes | Yes | 4.1/5 | 775 | Banashankari | Casual Dining | Pasta, Lunch Buffet, Masala Papad, Paneer Laja... | North Indian, Mughlai, Chinese | 800 | Buffet | Banashankari |
| 1 | Spice Elephant | Yes | No | 4.1/5 | 787 | Banashankari | Casual Dining | Momos, Lunch Buffet, Chocolate Nirvana, Thai G... | Chinese, North Indian, Thai | 800 | Buffet | Banashankari |
| 2 | San Churro Cafe | Yes | No | 3.8/5 | 918 | Banashankari | Cafe, Casual Dining | Churros, Cannelloni, Minestrone Soup, Hot Choc... | Cafe, Mexican, Italian | 800 | Buffet | Banashankari |
| 3 | Addhuri Udupi Bhojana | No | No | 3.7/5 | 88 | Banashankari | Quick Bites | Masala Dosa | South Indian, North Indian | 300 | Buffet | Banashankari |
| 4 | Grand Village | No | No | 3.8/5 | 166 | Basavanagudi | Casual Dining | Panipuri, Gol Gappe | North Indian, Rajasthani | 600 | Buffet | Banashankari |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 51712 | Best Brews - Four Points by Sheraton Bengaluru... | No | No | 3.6 /5 | 27 | Whitefield | Bar | NaN | Continental | 1,500 | Pubs and bars | Whitefield |
| 51713 | Vinod Bar And Restaurant | No | No | NaN | 0 | Whitefield | Bar | NaN | Finger Food | 600 | Pubs and bars | Whitefield |
| 51714 | Plunge - Sheraton Grand Bengaluru Whitefield H... | No | No | NaN | 0 | Whitefield | Bar | NaN | Finger Food | 2,000 | Pubs and bars | Whitefield |
| 51715 | Chime - Sheraton Grand Bengaluru Whitefield Ho... | No | Yes | 4.3 /5 | 236 | ITPL Main Road, Whitefield | Bar | Cocktails, Pizza, Buttermilk | Finger Food | 2,500 | Pubs and bars | Whitefield |
| 51716 | The Nest - The Den Bengaluru | No | No | 3.4 /5 | 13 | ITPL Main Road, Whitefield | Bar, Casual Dining | NaN | Finger Food, North Indian, Continental | 1,500 | Pubs and bars | Whitefield |
51717 rows × 12 columns
df['rate'].unique()
array(['4.1/5', '3.8/5', '3.7/5', '3.6/5', '4.6/5', '4.0/5', '4.2/5',
'3.9/5', '3.1/5', '3.0/5', '3.2/5', '3.3/5', '2.8/5', '4.4/5',
'4.3/5', 'NEW', '2.9/5', '3.5/5', nan, '2.6/5', '3.8 /5', '3.4/5',
'4.5/5', '2.5/5', '2.7/5', '4.7/5', '2.4/5', '2.2/5', '2.3/5',
'3.4 /5', '-', '3.6 /5', '4.8/5', '3.9 /5', '4.2 /5', '4.0 /5',
'4.1 /5', '3.7 /5', '3.1 /5', '2.9 /5', '3.3 /5', '2.8 /5',
'3.5 /5', '2.7 /5', '2.5 /5', '3.2 /5', '2.6 /5', '4.5 /5',
'4.3 /5', '4.4 /5', '4.9/5', '2.1/5', '2.0/5', '1.8/5', '4.6 /5',
'4.9 /5', '3.0 /5', '4.8 /5', '2.3 /5', '4.7 /5', '2.4 /5',
'2.1 /5', '2.2 /5', '2.0 /5', '1.8 /5'], dtype=object)
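def convert_rate(value):
    # 'NEW' and '-' mark restaurants that have no rating yet
    if value == 'NEW' or value == '-':
        return np.nan
    else:
        # Keep the part before '/', e.g. '4.1/5' -> 4.1
        value = str(value).split('/')
        value = value[0]
        return float(value)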
df['rate']=df['rate'].apply(convert_rate)
df.fillna(0, inplace=True)
df
| | name | online_order | book_table | rate | votes | location | rest_type | dish_liked | cuisines | approx_cost(for two people) | listed_in(type) | listed_in(city) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Jalsa | Yes | Yes | 4.1 | 775 | Banashankari | Casual Dining | Pasta, Lunch Buffet, Masala Papad, Paneer Laja... | North Indian, Mughlai, Chinese | 800 | Buffet | Banashankari |
| 1 | Spice Elephant | Yes | No | 4.1 | 787 | Banashankari | Casual Dining | Momos, Lunch Buffet, Chocolate Nirvana, Thai G... | Chinese, North Indian, Thai | 800 | Buffet | Banashankari |
| 2 | San Churro Cafe | Yes | No | 3.8 | 918 | Banashankari | Cafe, Casual Dining | Churros, Cannelloni, Minestrone Soup, Hot Choc... | Cafe, Mexican, Italian | 800 | Buffet | Banashankari |
| 3 | Addhuri Udupi Bhojana | No | No | 3.7 | 88 | Banashankari | Quick Bites | Masala Dosa | South Indian, North Indian | 300 | Buffet | Banashankari |
| 4 | Grand Village | No | No | 3.8 | 166 | Basavanagudi | Casual Dining | Panipuri, Gol Gappe | North Indian, Rajasthani | 600 | Buffet | Banashankari |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 51712 | Best Brews - Four Points by Sheraton Bengaluru... | No | No | 3.6 | 27 | Whitefield | Bar | 0 | Continental | 1,500 | Pubs and bars | Whitefield |
| 51713 | Vinod Bar And Restaurant | No | No | 0.0 | 0 | Whitefield | Bar | 0 | Finger Food | 600 | Pubs and bars | Whitefield |
| 51714 | Plunge - Sheraton Grand Bengaluru Whitefield H... | No | No | 0.0 | 0 | Whitefield | Bar | 0 | Finger Food | 2,000 | Pubs and bars | Whitefield |
| 51715 | Chime - Sheraton Grand Bengaluru Whitefield Ho... | No | Yes | 4.3 | 236 | ITPL Main Road, Whitefield | Bar | Cocktails, Pizza, Buttermilk | Finger Food | 2,500 | Pubs and bars | Whitefield |
| 51716 | The Nest - The Den Bengaluru | No | No | 3.4 | 13 | ITPL Main Road, Whitefield | Bar, Casual Dining | 0 | Finger Food, North Indian, Continental | 1,500 | Pubs and bars | Whitefield |
51717 rows × 12 columns
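The rating conversion can also be sketched in vectorized form, without a row-by-row function. The sample strings below are a toy stand-in for the raw 'rate' column, and `raw_rate` is a hypothetical name; this is one possible approach, not the method used in the notebook.

```python
import numpy as np
import pandas as pd

# Toy sample mimicking the raw 'rate' strings (not the real data).
raw_rate = pd.Series(['4.1/5', '3.8 /5', 'NEW', '-', np.nan])

# Keep the part before '/', strip stray spaces, and let anything that is
# not a number ('NEW', '-', missing) become NaN.
rate = pd.to_numeric(
    raw_rate.str.split('/').str[0].str.strip(),
    errors='coerce',
)
print(rate.tolist())
```

Leaving unrated restaurants as NaN instead of filling them with 0 keeps them out of later means and plots automatically, at the cost of having to handle NaN explicitly where a value is required.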
df.isnull().sum()
name                           0
online_order                   0
book_table                     0
rate                           0
votes                          0
location                       0
rest_type                      0
dish_liked                     0
cuisines                       0
approx_cost(for two people)    0
listed_in(type)                0
listed_in(city)                0
dtype: int64
df['votes'].unique()
array([ 775, 787, 918, ..., 4957, 2382, 843], dtype=int64)
df['votes'].value_counts()
0 10027
4 1140
6 992
7 872
9 738
...
3673 1
1862 1
3909 1
2155 1
843 1
Name: votes, Length: 2328, dtype: int64
The votes column contains only integers, so it needs no cleaning.
There are two columns that describe the location: 'location' and 'listed_in(city)'. I am going to drop the one that has the fewest unique values.
df['location'].unique()
array(['Banashankari', 'Basavanagudi', 'Mysore Road', 'Jayanagar',
'Kumaraswamy Layout', 'Rajarajeshwari Nagar', 'Vijay Nagar',
'Uttarahalli', 'JP Nagar', 'South Bangalore', 'City Market',
'Nagarbhavi', 'Bannerghatta Road', 'BTM', 'Kanakapura Road',
'Bommanahalli', 0, 'CV Raman Nagar', 'Electronic City', 'HSR',
'Marathahalli', 'Sarjapur Road', 'Wilson Garden', 'Shanti Nagar',
'Koramangala 5th Block', 'Koramangala 8th Block', 'Richmond Road',
'Koramangala 7th Block', 'Jalahalli', 'Koramangala 4th Block',
'Bellandur', 'Whitefield', 'East Bangalore', 'Old Airport Road',
'Indiranagar', 'Koramangala 1st Block', 'Frazer Town', 'RT Nagar',
'MG Road', 'Brigade Road', 'Lavelle Road', 'Church Street',
'Ulsoor', 'Residency Road', 'Shivajinagar', 'Infantry Road',
'St. Marks Road', 'Cunningham Road', 'Race Course Road',
'Commercial Street', 'Vasanth Nagar', 'HBR Layout', 'Domlur',
'Ejipura', 'Jeevan Bhima Nagar', 'Old Madras Road', 'Malleshwaram',
'Seshadripuram', 'Kammanahalli', 'Koramangala 6th Block',
'Majestic', 'Langford Town', 'Central Bangalore', 'Sanjay Nagar',
'Brookefield', 'ITPL Main Road, Whitefield',
'Varthur Main Road, Whitefield', 'KR Puram',
'Koramangala 2nd Block', 'Koramangala 3rd Block', 'Koramangala',
'Hosur Road', 'Rajajinagar', 'Banaswadi', 'North Bangalore',
'Nagawara', 'Hennur', 'Kalyan Nagar', 'New BEL Road', 'Jakkur',
'Rammurthy Nagar', 'Thippasandra', 'Kaggadasapura', 'Hebbal',
'Kengeri', 'Sankey Road', 'Sadashiv Nagar', 'Basaveshwara Nagar',
'Yeshwantpur', 'West Bangalore', 'Magadi Road', 'Yelahanka',
'Sahakara Nagar', 'Peenya'], dtype=object)
df['listed_in(city)'].unique()
array(['Banashankari', 'Bannerghatta Road', 'Basavanagudi', 'Bellandur',
'Brigade Road', 'Brookefield', 'BTM', 'Church Street',
'Electronic City', 'Frazer Town', 'HSR', 'Indiranagar',
'Jayanagar', 'JP Nagar', 'Kalyan Nagar', 'Kammanahalli',
'Koramangala 4th Block', 'Koramangala 5th Block',
'Koramangala 6th Block', 'Koramangala 7th Block', 'Lavelle Road',
'Malleshwaram', 'Marathahalli', 'MG Road', 'New BEL Road',
'Old Airport Road', 'Rajajinagar', 'Residency Road',
'Sarjapur Road', 'Whitefield'], dtype=object)
df=df.drop(['listed_in(city)'], axis=1)
location=df['location'].value_counts(ascending=False)
location
BTM 5124
HSR 2523
Koramangala 5th Block 2504
JP Nagar 2235
Whitefield 2144
...
West Bangalore 6
Yelahanka 6
Jakkur 3
Rajarajeshwari Nagar 2
Peenya 1
Name: location, Length: 94, dtype: int64
I am going to group the least frequent locations (those with fewer than 500 restaurants) under an 'others' category.
loc_500=location[location<500]
def make_other_loc(value):
    # loc_500 is indexed by location name, so check membership in its index
    if value in loc_500.index:
        return 'others'
    else:
        return value
df['location']=df['location'].apply(make_other_loc)
df['location'].value_counts(ascending=False)
others                   8141
BTM                      5124
HSR                      2523
Koramangala 5th Block    2504
JP Nagar                 2235
Whitefield               2144
Indiranagar              2083
Jayanagar                1926
Marathahalli             1846
Bannerghatta Road        1630
Bellandur                1286
Electronic City          1258
Koramangala 1st Block    1238
Brigade Road             1218
Koramangala 7th Block    1181
Koramangala 6th Block    1156
Sarjapur Road            1065
Ulsoor                   1023
Koramangala 4th Block    1017
MG Road                   918
Banashankari              906
Kalyan Nagar              853
Richmond Road             812
Frazer Town               727
Malleshwaram              725
Basavanagudi              684
Residency Road            675
Banaswadi                 664
Brookefield               658
New BEL Road              649
Kammanahalli              648
Rajajinagar               591
Church Street             569
Lavelle Road              529
Shanti Nagar              511
Name: location, dtype: int64
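The same "group rare values under 'others'" step is repeated for several columns in this notebook, so it could be sketched as one reusable helper. The function name `group_rare` and the toy series are assumptions made for illustration only.

```python
import pandas as pd

def group_rare(series, min_count, label='others'):
    """Replace values that occur fewer than min_count times with label."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # where() keeps entries for which the condition is True, replaces the rest
    return series.where(~series.isin(rare), label)

# Toy data: 'Peenya' appears once and falls below the threshold of 2.
sample = pd.Series(['BTM', 'BTM', 'BTM', 'HSR', 'HSR', 'Peenya'])
grouped = group_rare(sample, min_count=2)
print(grouped.value_counts())
```

With such a helper, the location step would read `df['location'] = group_rare(df['location'], 500)`, and the cuisines and rest_type steps would differ only in the threshold.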
df['cuisines'].unique()
array(['North Indian, Mughlai, Chinese', 'Chinese, North Indian, Thai',
'Cafe, Mexican, Italian', ...,
'North Indian, Street Food, Biryani', 'Chinese, Mughlai',
'North Indian, Chinese, Arabian, Momos'], dtype=object)
cuisines=df['cuisines'].value_counts(ascending=False)
cuisines
North Indian 2913
North Indian, Chinese 2385
South Indian 1828
Biryani 918
Bakery, Desserts 911
...
North Indian, Chinese, South Indian, Juices 1
North Indian, Chinese, Kebab, Mughlai 1
Chinese, Vietnamese, Thai, Malaysian 1
Arabian, Lebanese, Chinese, Rolls 1
North Indian, Chinese, Arabian, Momos 1
Name: cuisines, Length: 2724, dtype: int64
cuisines_100=cuisines[cuisines<100]
cuisines_100
North Indian, Continental, Chinese 97
Juices 94
Fast Food, North Indian 93
Bengali, North Indian 93
Beverages, Juices 90
..
North Indian, Chinese, South Indian, Juices 1
North Indian, Chinese, Kebab, Mughlai 1
Chinese, Vietnamese, Thai, Malaysian 1
Arabian, Lebanese, Chinese, Rolls 1
North Indian, Chinese, Arabian, Momos 1
Name: cuisines, Length: 2655, dtype: int64
def make_other_cuisines(value):
    # cuisines_100 is indexed by cuisine string, so check membership in its index
    if value in cuisines_100.index:
        return 'others'
    else:
        return value
df['cuisines']=df['cuisines'].apply(make_other_cuisines)
df['cuisines'].value_counts(ascending=False)
others 26505
North Indian 2913
North Indian, Chinese 2385
South Indian 1828
Biryani 918
...
South Indian, Chinese, North Indian 105
Italian, Pizza 105
North Indian, Mughlai, Chinese 104
South Indian, Fast Food 104
North Indian, Chinese, Seafood 102
Name: cuisines, Length: 70, dtype: int64
df['approx_cost(for two people)'].unique()
array(['800', '300', '600', '700', '550', '500', '450', '650', '400',
'900', '200', '750', '150', '850', '100', '1,200', '350', '250',
'950', '1,000', '1,500', '1,300', '199', '80', '1,100', '160',
'1,600', '230', '130', '50', '190', '1,700', 0, '1,400', '180',
'1,350', '2,200', '2,000', '1,800', '1,900', '330', '2,500',
'2,100', '3,000', '2,800', '3,400', '40', '1,250', '3,500',
'4,000', '2,400', '2,600', '120', '1,450', '469', '70', '3,200',
'60', '560', '240', '360', '6,000', '1,050', '2,300', '4,100',
'5,000', '3,700', '1,650', '2,700', '4,500', '140'], dtype=object)
I am going to remove the commas and convert the values to integers
def make_integer(value):
    # Drop thousands separators such as '1,200' before converting
    return int(str(value).replace(',', ''))
df['approx_cost(for two people)']=df['approx_cost(for two people)'].apply(make_integer)
df['approx_cost(for two people)'].unique()
array([ 800, 300, 600, 700, 550, 500, 450, 650, 400, 900, 200,
750, 150, 850, 100, 1200, 350, 250, 950, 1000, 1500, 1300,
199, 80, 1100, 160, 1600, 230, 130, 50, 190, 1700, 0,
1400, 180, 1350, 2200, 2000, 1800, 1900, 330, 2500, 2100, 3000,
2800, 3400, 40, 1250, 3500, 4000, 2400, 2600, 120, 1450, 469,
70, 3200, 60, 560, 240, 360, 6000, 1050, 2300, 4100, 5000,
3700, 1650, 2700, 4500, 140], dtype=int64)
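The comma removal can also be done on the whole column at once with pandas string methods. A minimal sketch on toy values (`raw_cost` is a hypothetical stand-in for the 'approx_cost(for two people)' column, including the 0 left behind by the earlier fillna):

```python
import pandas as pd

# Toy values: strings with thousands separators plus an integer 0 placeholder.
raw_cost = pd.Series(['800', '1,200', '6,000', 0], dtype=object)

# Cast everything to str, drop the commas literally (regex=False), then convert.
cost = pd.to_numeric(
    raw_cost.astype(str).str.replace(',', '', regex=False)
).astype(int)
print(cost.tolist())
```

This is equivalent in result to applying make_integer row by row; the vectorized form is mainly a readability and speed preference.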
df['rest_type'].unique()
array(['Casual Dining', 'Cafe, Casual Dining', 'Quick Bites',
'Casual Dining, Cafe', 'Cafe', 'Quick Bites, Cafe',
'Cafe, Quick Bites', 'Delivery', 'Mess', 'Dessert Parlor',
'Bakery, Dessert Parlor', 'Pub', 'Bakery', 'Takeaway, Delivery',
'Fine Dining', 'Beverage Shop', 'Sweet Shop', 'Bar',
'Beverage Shop, Quick Bites', 'Confectionery',
'Quick Bites, Beverage Shop', 'Dessert Parlor, Sweet Shop',
'Bakery, Quick Bites', 'Sweet Shop, Quick Bites', 'Kiosk',
'Food Truck', 'Quick Bites, Dessert Parlor',
'Beverage Shop, Dessert Parlor', 'Takeaway', 'Pub, Casual Dining',
'Casual Dining, Bar', 'Dessert Parlor, Beverage Shop',
'Quick Bites, Bakery', 'Dessert Parlor, Quick Bites',
'Microbrewery, Casual Dining', 'Lounge', 'Bar, Casual Dining',
'Food Court', 'Cafe, Bakery', 0, 'Dhaba',
'Quick Bites, Sweet Shop', 'Microbrewery',
'Food Court, Quick Bites', 'Pub, Bar', 'Casual Dining, Pub',
'Lounge, Bar', 'Food Court, Dessert Parlor',
'Casual Dining, Sweet Shop', 'Food Court, Casual Dining',
'Casual Dining, Microbrewery', 'Sweet Shop, Dessert Parlor',
'Bakery, Beverage Shop', 'Lounge, Casual Dining',
'Cafe, Food Court', 'Beverage Shop, Cafe', 'Cafe, Dessert Parlor',
'Dessert Parlor, Cafe', 'Dessert Parlor, Bakery',
'Microbrewery, Pub', 'Bakery, Food Court', 'Club',
'Quick Bites, Food Court', 'Bakery, Cafe', 'Bar, Cafe',
'Pub, Cafe', 'Casual Dining, Irani Cafee', 'Fine Dining, Lounge',
'Bar, Quick Bites', 'Bakery, Kiosk', 'Pub, Microbrewery',
'Microbrewery, Lounge', 'Fine Dining, Microbrewery',
'Fine Dining, Bar', 'Mess, Quick Bites', 'Dessert Parlor, Kiosk',
'Bhojanalya', 'Casual Dining, Quick Bites', 'Pop Up', 'Cafe, Bar',
'Casual Dining, Lounge', 'Bakery, Sweet Shop', 'Microbrewery, Bar',
'Cafe, Lounge', 'Bar, Pub', 'Lounge, Cafe', 'Club, Casual Dining',
'Quick Bites, Mess', 'Quick Bites, Meat Shop',
'Quick Bites, Kiosk', 'Lounge, Microbrewery',
'Food Court, Beverage Shop', 'Dessert Parlor, Food Court',
'Bar, Lounge'], dtype=object)
df['rest_type'].value_counts()
Quick Bites 19132
Casual Dining 10330
Cafe 3732
Delivery 2604
Dessert Parlor 2263
...
Dessert Parlor, Kiosk 2
Food Court, Beverage Shop 2
Dessert Parlor, Food Court 2
Sweet Shop, Dessert Parlor 1
Quick Bites, Kiosk 1
Name: rest_type, Length: 94, dtype: int64
rest_types=df['rest_type'].value_counts()
type_less_than_2000=rest_types[rest_types<2000]
type_less_than_2000
Casual Dining, Bar 1154
Bakery 1141
Beverage Shop 867
Bar 697
Food Court 624
...
Dessert Parlor, Kiosk 2
Food Court, Beverage Shop 2
Dessert Parlor, Food Court 2
Sweet Shop, Dessert Parlor 1
Quick Bites, Kiosk 1
Name: rest_type, Length: 88, dtype: int64
def make_other_type(value):
    # type_less_than_2000 is indexed by restaurant type
    if value in type_less_than_2000.index:
        return 'others'
    else:
        return value
df['rest_type']=df['rest_type'].apply(make_other_type)
df['rest_type'].value_counts()
Quick Bites           19132
others                11619
Casual Dining         10330
Cafe                   3732
Delivery               2604
Dessert Parlor         2263
Takeaway, Delivery     2037
Name: rest_type, dtype: int64
df['listed_in(type)'].unique()
array(['Buffet', 'Cafes', 'Delivery', 'Desserts', 'Dine-out',
'Drinks & nightlife', 'Pubs and bars'], dtype=object)
There are not many types in this category, so there is no need to clean this column.
df['dish_liked'].unique()
array(['Pasta, Lunch Buffet, Masala Papad, Paneer Lajawab, Tomato Shorba, Dum Biryani, Sweet Corn Soup',
'Momos, Lunch Buffet, Chocolate Nirvana, Thai Green Curry, Paneer Tikka, Dum Biryani, Chicken Biryani',
'Churros, Cannelloni, Minestrone Soup, Hot Chocolate, Pink Sauce Pasta, Salsa, Veg Supreme Pizza',
...,
'Noodles, Chicken Noodle, Momos, American Chopsuey, Salad, Manchow Soup, Manchurian',
'Chicken Quesadilla, Naan, Breakfast Buffet, Cheesecake, Cocktails, Lunch Buffet, Biryani',
'Biryani, Andhra Meal'], dtype=object)
dish_liked=df['dish_liked'].value_counts(ascending=False)
dish_liked
0 28078
Biryani 182
Chicken Biryani 73
Friendly Staff 69
Waffles 68
...
Butter Chicken, Shawarma Roll, Chicken Shawarama, Chicken Grill, Rolls, Al Faham Chicken, Biryani 1
Filter Coffee, Sandwich, Bonda, Vada, Masala Dosa, Salad, Aloo Curry 1
Burgers, Fries, Jumbo Royale Burger, Salads, Peri Peri Chicken Salad, Potato Wedges, Rolls 1
Chaat, Pav Bhaji, Raj Kachori, Buttermilk, Ajwaini Paratha, Tawa Pulav, Sev Puri 1
Paratha, Dal Makhani, Lassi, Naan, Veg Thali, Chole, Kulcha 1
Name: dish_liked, Length: 5272, dtype: int64
Here the dominant value is the null value (filled with 0 earlier): 28078 of the entries are null. The rest list the dishes that customers liked. So I am going to label the null entries as 'not rated item'.
dish_0=dish_liked[dish_liked==28078]
dish_0
0    28078
Name: dish_liked, dtype: int64
def make_other_dish(value):
    # dish_0 holds only the 0 placeholder that replaced the null entries
    if value in dish_0.index:
        return 'not rated item'
    else:
        return value
df['dish_liked']=df['dish_liked'].apply(make_other_dish)
df['dish_liked'].value_counts()
not rated item 28078
Biryani 182
Chicken Biryani 73
Friendly Staff 69
Waffles 68
...
Butter Chicken, Shawarma Roll, Chicken Shawarama, Chicken Grill, Rolls, Al Faham Chicken, Biryani 1
Filter Coffee, Sandwich, Bonda, Vada, Masala Dosa, Salad, Aloo Curry 1
Burgers, Fries, Jumbo Royale Burger, Salads, Peri Peri Chicken Salad, Potato Wedges, Rolls 1
Chaat, Pav Bhaji, Raj Kachori, Buttermilk, Ajwaini Paratha, Tawa Pulav, Sev Puri 1
Paratha, Dal Makhani, Lassi, Naan, Veg Thali, Chole, Kulcha 1
Name: dish_liked, Length: 5272, dtype: int64
Now I am going to group the least liked items into 'other dishes', that is, every dish_liked value that occurs 50 times or fewer.
dish_50=dish_liked[dish_liked<=50]
dish_50
Coffee 42
Rooftop Ambience 42
Pizza 38
Burgers 33
Cocktails 29
..
Butter Chicken, Shawarma Roll, Chicken Shawarama, Chicken Grill, Rolls, Al Faham Chicken, Biryani 1
Filter Coffee, Sandwich, Bonda, Vada, Masala Dosa, Salad, Aloo Curry 1
Burgers, Fries, Jumbo Royale Burger, Salads, Peri Peri Chicken Salad, Potato Wedges, Rolls 1
Chaat, Pav Bhaji, Raj Kachori, Buttermilk, Ajwaini Paratha, Tawa Pulav, Sev Puri 1
Paratha, Dal Makhani, Lassi, Naan, Veg Thali, Chole, Kulcha 1
Name: dish_liked, Length: 5265, dtype: int64
def make_other_dish2(value):
    # dish_50 is indexed by the dish_liked strings that occur 50 times or fewer
    if value in dish_50.index:
        return 'other dishes'
    else:
        return value
df['dish_liked']=df['dish_liked'].apply(make_other_dish2)
df['dish_liked'].value_counts()
not rated item     28078
other dishes       23134
Biryani              182
Chicken Biryani       73
Friendly Staff        69
Waffles               68
Paratha               57
Masala Dosa           56
Name: dish_liked, dtype: int64
df
| | name | online_order | book_table | rate | votes | location | rest_type | dish_liked | cuisines | approx_cost(for two people) | listed_in(type) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Jalsa | Yes | Yes | 4.1 | 775 | Banashankari | Casual Dining | other dishes | North Indian, Mughlai, Chinese | 800 | Buffet |
| 1 | Spice Elephant | Yes | No | 4.1 | 787 | Banashankari | Casual Dining | other dishes | others | 800 | Buffet |
| 2 | San Churro Cafe | Yes | No | 3.8 | 918 | Banashankari | others | other dishes | others | 800 | Buffet |
| 3 | Addhuri Udupi Bhojana | No | No | 3.7 | 88 | Banashankari | Quick Bites | Masala Dosa | South Indian, North Indian | 300 | Buffet |
| 4 | Grand Village | No | No | 3.8 | 166 | Basavanagudi | Casual Dining | other dishes | others | 600 | Buffet |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 51712 | Best Brews - Four Points by Sheraton Bengaluru... | No | No | 3.6 | 27 | Whitefield | others | not rated item | Continental | 1500 | Pubs and bars |
| 51713 | Vinod Bar And Restaurant | No | No | 0.0 | 0 | Whitefield | others | not rated item | Finger Food | 600 | Pubs and bars |
| 51714 | Plunge - Sheraton Grand Bengaluru Whitefield H... | No | No | 0.0 | 0 | Whitefield | others | not rated item | Finger Food | 2000 | Pubs and bars |
| 51715 | Chime - Sheraton Grand Bengaluru Whitefield Ho... | No | Yes | 4.3 | 236 | others | others | other dishes | Finger Food | 2500 | Pubs and bars |
| 51716 | The Nest - The Den Bengaluru | No | No | 3.4 | 13 | others | others | not rated item | others | 1500 | Pubs and bars |
51717 rows × 11 columns
After data cleaning, statistical values such as the mean, standard deviation and quartiles can be obtained with the describe() function.
df.describe()
| | rate | votes | approx_cost(for two people) |
|---|---|---|---|
| count | 51717.000000 | 51717.000000 | 51717.000000 |
| mean | 2.981209 | 283.697527 | 551.715587 |
| std | 1.516766 | 803.838853 | 439.717709 |
| min | 0.000000 | 0.000000 | 0.000000 |
| 25% | 3.000000 | 7.000000 | 300.000000 |
| 50% | 3.600000 | 41.000000 | 400.000000 |
| 75% | 3.900000 | 198.000000 | 650.000000 |
| max | 4.900000 | 16832.000000 | 6000.000000 |
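Because missing ratings were filled with 0 earlier, the mean of 2.98 for rate includes those placeholder zeros, even though the quartiles sit between 3.0 and 3.9. A small sketch on toy numbers (the `toy` frame is an assumption for illustration) shows how the mean shifts once the zeros are treated as missing again:

```python
import numpy as np
import pandas as pd

# Toy ratings: two real ratings and two zeros standing in for "no rating".
toy = pd.DataFrame({'rate': [4.0, 3.0, 0.0, 0.0]})

# Mean over all rows, placeholders included.
mean_with_zeros = toy['rate'].mean()

# Turn the placeholder zeros back into NaN; mean() skips NaN by default.
rated = toy['rate'].replace(0, np.nan)
mean_rated_only = rated.mean()

print(mean_with_zeros, mean_rated_only)
```

On the real data, `df['rate'].replace(0, np.nan).describe()` would give the statistics for rated restaurants only, under the assumption that no restaurant genuinely has a rating of 0.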
I am going to check how many times each location appears.
# Count plot for location, excluding the grouped 'others' category
df = df[df['location'] != 'others']
plt.figure(figsize=(20,20))
plt.title('Visualization of Location')
sns.countplot(x=df['location'])
plt.xticks(rotation=90);
Fig 1 shows the number of restaurants in each area of Bengaluru.
plt.figure(figsize=(6,6))
plt.title('Visualization of Online Order facility')
sns.countplot(x=df['online_order'])
plt.show()
Fig 2 shows how many restaurants offer online ordering. More than 30,000 restaurants in Bengaluru provide an online ordering facility, and more than 20,000 do not.
# Plotly creates its own figure, so no Matplotlib figure is needed here
fig = px.violin(df, y="book_table", title='Visualization of Table booking facility')
fig.show()
Fig 3 shows the table reservation facility. About 80% of the restaurants do not offer reservations; about 20% do.
# Take values and labels from the same value_counts() result so each slice gets its own label
rest_type_counts = df['rest_type'].value_counts()
fig = px.pie(values=rest_type_counts.values, names=rest_type_counts.index, title='Types of the Restaurant')
fig.show()
Fig 4 shows the share of each restaurant type. About 37% of the restaurants are Quick Bites, 20% are Casual Dining centers, 7.2% are Cafes and 4.4% are Dessert Parlors.
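When a pie chart is drawn from category counts, the labels and the values need to come from the same object. value_counts() sorts by frequency, while unique() returns values in order of first appearance, so pairing the two can attach a count to the wrong label. A minimal sketch on toy data (the `rest_type` series here is hypothetical):

```python
import pandas as pd

# Toy column: 'Casual Dining' appears first but is the least frequent.
rest_type = pd.Series([
    'Casual Dining', 'Quick Bites', 'Quick Bites',
    'Quick Bites', 'Cafe', 'Cafe',
])

counts = rest_type.value_counts()

# Labels and values taken from the same Series stay aligned.
names = counts.index.tolist()
values = counts.tolist()

# unique() follows order of appearance, which differs from the count order.
appearance_order = rest_type.unique().tolist()

print(list(zip(names, values)))
print(appearance_order)
```

Passing `names=counts.index` and `values=counts.values` to the plotting call keeps every slice matched to its own category.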
Visualizing the dish_liked Column
d=df.groupby(['dish_liked'])['dish_liked'].count()
plt.title('Visualization of dishes liked')
d.plot(kind='bar',figsize=(10,8));
Fig 5 shows the dishes liked by customers. The remaining liked items are Chicken Biryani, Masala Dosa, Paratha and Waffles. This data is insufficient to say which dish people like best, because most entries have no dish listed and many dishes appear only a few times.
plt.figure(figsize=(16,16))
values=['others']
df= df[df['cuisines'].isin(values) == False]
sns.countplot(x=df['cuisines'])
plt.xticks(rotation=90)
plt.title('Visualization of Cuisines')
plt.show()
Fig 6 shows the cuisines served in the restaurants. Restaurants serving North Indian food are the most common, followed by North Indian with Chinese, South Indian, Biryani, and Bakery with Desserts, in decreasing order.
# Take values and labels from the same value_counts() result so each slice gets its own label
listed_type_counts = df['listed_in(type)'].value_counts()
fig = px.pie(values=listed_type_counts.values, names=listed_type_counts.index, title='Categories of the Restaurant')
fig.show()
Fig 7 shows the categories of restaurant. 50.2% (25942) of the restaurants are listed under Delivery and 34.4% (17779) under Dine-out. Desserts account for 6.95% (3593), Cafes for 3.33% (1723), Drinks & nightlife for 2.13% (1101), Buffet for 1.71% (882) and Pubs and bars for 1.35% (697).
Zero_values=[0]
df= df[df['rate'].isin(Zero_values) == False]
plt.title('Visualization of rating')
sns.histplot(data=df, y="rate");
Fig 8 shows the distribution of ratings, that is, the number of restaurants at each rating. The average rating is about 4.0.
sns.displot(data=df, x="approx_cost(for two people)");
Fig 9: shows the distribution of cost for two people dining versus the number of restaurants having the same cost.
I want to use this data to answer the following questions
sns.barplot(data=df, x="rate", y="votes")
plt.title('Relationship between Rating and Votes for each rating')
plt.xticks(rotation=90);
Fig 10 shows the number of votes for each rating. sns.barplot plots the mean of votes per rating by default, so restaurants rated 4.7 received 5000+ votes on average, followed by those rated 4.9 with 4000+ votes on average. Poorly rated restaurants appear as well, with far fewer votes.
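The bar heights in a seaborn barplot are, by default, the mean of the y variable within each x group. The numbers behind the chart can be reproduced with a groupby, sketched here on toy data (the `toy` frame and its values are assumptions):

```python
import pandas as pd

# Toy data: two restaurants rated 4.7 and two rated 3.0.
toy = pd.DataFrame({
    'rate': [4.7, 4.7, 3.0, 3.0],
    'votes': [6000, 4000, 10, 30],
})

# Mean votes per rating: what the default barplot draws.
mean_votes = toy.groupby('rate')['votes'].mean()

# Total votes per rating: a different quantity, shown for contrast.
total_votes = toy.groupby('rate')['votes'].sum()

print(mean_votes)
print(total_votes)
```

If the total number of votes per rating is the quantity of interest, `estimator='sum'` can be passed to `sns.barplot`, or the summed series can be plotted directly.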
plt.figure(figsize=(6,6))
plt.title('Relationship between rating and online ordering facility')
sns.boxplot(x='online_order',y='rate', data=df);
Fig 11 shows the online ordering facility versus rating. The typical rating of restaurants that offer online ordering is higher than that of restaurants that do not. As Fig 2 shows, restaurants that provide online ordering also outnumber those that do not.
plt.figure(figsize=(6,6))
plt.title('Relationship between rating and online booking facility')
sns.boxplot(x='book_table',y='rate', data=df);
Fig 12 shows the table booking facility versus rating. The typical rating of restaurants that offer online table booking is higher than that of restaurants that do not.
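A box plot marks the median rather than the mean, so the comparison read off the figure can be checked numerically by computing both per group. A minimal sketch with made-up ratings (the `toy` frame is an assumption):

```python
import pandas as pd

# Toy ratings for restaurants with and without table booking.
toy = pd.DataFrame({
    'book_table': ['Yes', 'Yes', 'No', 'No', 'No'],
    'rate': [4.0, 4.5, 3.0, 3.5, 4.0],
})

# Mean and median rating per group, side by side.
summary = toy.groupby('book_table')['rate'].agg(['mean', 'median'])
print(summary)
```

The same call with 'online_order' as the grouping column would give the numbers behind the online ordering comparison.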
Since there are many location values, I am going to group them first and create a pivot table; later I will plot a bar chart.
# Count restaurants per (location, online_order) pair; reset_index() turns the
# group keys back into columns, so no round trip through a CSV file is needed.
x=df.groupby(['location', 'online_order'])['name'].count().reset_index()
x=pd.pivot_table(x,index=['location'],columns=['online_order'],fill_value=0,aggfunc='sum')
x
| location | online_order: No | online_order: Yes |
|---|---|---|
| BTM | 597 | 1400 |
| Banashankari | 195 | 235 |
| Banaswadi | 112 | 140 |
| Bannerghatta Road | 217 | 411 |
| Basavanagudi | 105 | 227 |
| Bellandur | 195 | 337 |
| Brigade Road | 166 | 219 |
| Brookefield | 100 | 173 |
| Church Street | 85 | 119 |
| Electronic City | 189 | 203 |
| Frazer Town | 119 | 205 |
| HSR | 200 | 728 |
| Indiranagar | 234 | 524 |
| JP Nagar | 274 | 500 |
| Jayanagar | 225 | 560 |
| Kalyan Nagar | 102 | 184 |
| Kammanahalli | 85 | 106 |
| Koramangala 1st Block | 93 | 363 |
| Koramangala 4th Block | 122 | 208 |
| Koramangala 5th Block | 177 | 420 |
| Koramangala 6th Block | 134 | 278 |
| Koramangala 7th Block | 134 | 245 |
| Lavelle Road | 55 | 36 |
| MG Road | 149 | 152 |
| Malleshwaram | 153 | 188 |
| Marathahalli | 233 | 473 |
| New BEL Road | 84 | 136 |
| Rajajinagar | 121 | 146 |
| Residency Road | 188 | 82 |
| Richmond Road | 223 | 71 |
| Sarjapur Road | 97 | 290 |
| Shanti Nagar | 97 | 103 |
| Ulsoor | 113 | 223 |
| Whitefield | 336 | 451 |
x.plot(kind='bar',figsize=(20,8), title='Visualization of online ordering based on different location');
Fig 10: shows the number of restaurants with and without online ordering, by location in Bengaluru. Leaving aside minor areas, BTM Layout has the most restaurants offering online ordering (1400). In almost every location, restaurants that offer online ordering outnumber those that do not; the exceptions are Lavelle Road, Residency Road and Richmond Road.
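The same location-by-flag table can also be built in one step with `pd.crosstab`. The sketch below runs on a made-up five-row sample; note that `crosstab` counts rows, whereas the pivot above counts non-null `name` values, so the two agree only when no names are missing.

```python
import pandas as pd

# Hypothetical sample: three restaurants in BTM, two on Lavelle Road.
sample = pd.DataFrame({
    "location": ["BTM", "BTM", "BTM", "Lavelle Road", "Lavelle Road"],
    "online_order": ["Yes", "Yes", "No", "No", "No"],
    "name": ["A", "B", "C", "D", "E"],
})

# One row per location, one column per online_order value, cells are row counts.
counts = pd.crosstab(sample["location"], sample["online_order"])

# Locations where restaurants without online ordering outnumber those with it.
mostly_offline = counts[counts["No"] > counts["Yes"]].index.tolist()
print(counts)
print(mostly_offline)
```

A filter like `mostly_offline` makes it easy to check statements of the form "in every location" against the table instead of reading them off the chart.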
# Count restaurants per (location, book_table) pair, then pivot so that each
# location is a row and the Yes/No counts are columns.
y=df.groupby(['location', 'book_table'])['name'].count().reset_index()
y=pd.pivot_table(y,index=['location'],columns=['book_table'],fill_value=0,aggfunc='sum')
y
| location | book_table: No | book_table: Yes |
|---|---|---|
| BTM | 1973 | 24 |
| Banashankari | 410 | 20 |
| Banaswadi | 252 | 0 |
| Bannerghatta Road | 610 | 18 |
| Basavanagudi | 326 | 6 |
| Bellandur | 508 | 24 |
| Brigade Road | 364 | 21 |
| Brookefield | 264 | 9 |
| Church Street | 168 | 36 |
| Electronic City | 379 | 13 |
| Frazer Town | 324 | 0 |
| HSR | 882 | 46 |
| Indiranagar | 666 | 92 |
| JP Nagar | 731 | 43 |
| Jayanagar | 739 | 46 |
| Kalyan Nagar | 273 | 13 |
| Kammanahalli | 187 | 4 |
| Koramangala 1st Block | 456 | 0 |
| Koramangala 4th Block | 282 | 48 |
| Koramangala 5th Block | 530 | 67 |
| Koramangala 6th Block | 379 | 33 |
| Koramangala 7th Block | 343 | 36 |
| Lavelle Road | 54 | 37 |
| MG Road | 257 | 44 |
| Malleshwaram | 304 | 37 |
| Marathahalli | 683 | 23 |
| New BEL Road | 207 | 13 |
| Rajajinagar | 260 | 7 |
| Residency Road | 232 | 38 |
| Richmond Road | 279 | 15 |
| Sarjapur Road | 361 | 26 |
| Shanti Nagar | 200 | 0 |
| Ulsoor | 301 | 35 |
| Whitefield | 748 | 39 |
y.plot(kind='bar',figsize=(20,8), title='Visualization of online table booking facility based on different location');
Fig 11: shows the table booking facility by location. About 2000 restaurants in BTM Layout (1973) do not provide an online table booking facility, against only 24 that do. The same pattern holds in the other locations, though the number of restaurants varies.
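Since the locations differ a lot in size, the share of restaurants that take bookings is easier to compare than the raw counts. The sketch below re-enters three rows of the table above by hand (the `counts` frame is typed in for illustration rather than computed from `df`) and converts them to row percentages.

```python
import pandas as pd

# Three rows copied from the book_table pivot above.
counts = pd.DataFrame(
    {"No": [1973, 666, 54], "Yes": [24, 92, 37]},
    index=["BTM", "Indiranagar", "Lavelle Road"],
)

# Divide each row by its total to get the percentage per location.
share = counts.div(counts.sum(axis=1), axis=0) * 100
print(share.round(1))
```

On these rows BTM has by far the most restaurants but the smallest share with table booking (24 of 1997), while Lavelle Road has few restaurants but a much larger share (37 of 91).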
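# Count restaurants per (location, cost) pair, then average per location.
# Note: this averages the distinct price points seen in each location (and the
# number of restaurants per price point), not the cost of each restaurant.
z=df.groupby(['location','approx_cost(for two people)'])['name'].count().reset_index()
z=pd.pivot_table(z,index=['location'],fill_value=0,aggfunc='mean')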
z
| location | approx_cost(for two people) | name |
|---|---|---|
| BTM | 573.333333 | 83.208333 |
| Banashankari | 568.421053 | 22.631579 |
| Banaswadi | 479.411765 | 14.823529 |
| Bannerghatta Road | 584.210526 | 33.052632 |
| Basavanagudi | 603.125000 | 20.750000 |
| Bellandur | 673.809524 | 25.333333 |
| Brigade Road | 573.888889 | 21.388889 |
| Brookefield | 460.000000 | 18.200000 |
| Church Street | 650.000000 | 13.600000 |
| Electronic City | 666.666667 | 18.666667 |
| Frazer Town | 468.750000 | 20.250000 |
| HSR | 484.000000 | 46.400000 |
| Indiranagar | 643.333333 | 31.583333 |
| JP Nagar | 607.142857 | 36.857143 |
| Jayanagar | 635.714286 | 37.380952 |
| Kalyan Nagar | 513.888889 | 15.888889 |
| Kammanahalli | 466.666667 | 12.733333 |
| Koramangala 1st Block | 496.875000 | 28.500000 |
| Koramangala 4th Block | 743.333333 | 22.000000 |
| Koramangala 5th Block | 597.368421 | 31.421053 |
| Koramangala 6th Block | 600.000000 | 24.235294 |
| Koramangala 7th Block | 558.333333 | 21.055556 |
| Lavelle Road | 840.000000 | 9.100000 |
| MG Road | 1105.882353 | 17.705882 |
| Malleshwaram | 667.619048 | 16.238095 |
| Marathahalli | 634.782609 | 30.695652 |
| New BEL Road | 666.666667 | 10.476190 |
| Rajajinagar | 527.777778 | 14.833333 |
| Residency Road | 1072.727273 | 12.272727 |
| Richmond Road | 947.222222 | 16.333333 |
| Sarjapur Road | 618.181818 | 17.590909 |
| Shanti Nagar | 395.000000 | 20.000000 |
| Ulsoor | 753.333333 | 22.400000 |
| Whitefield | 935.714286 | 28.107143 |
z.plot(kind='bar',figsize=(20,8),title='Visualization of approximate cost for 2 people based on different location');
Fig 12: shows, for each area, the mean approximate cost for two people across the price points found there. MG Road (about 1106) and Residency Road (about 1073) are the costliest areas, followed by Richmond Road (947), Whitefield (936) and Lavelle Road (840); Ulsoor (753) and Koramangala 4th Block (743) are also on the expensive side. The other areas are comparatively cheaper, with Shanti Nagar the lowest (395).
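The pivot above averages the distinct price points in each location, which is not the same as the average cost per restaurant. The sketch below contrasts the two on a made-up six-row sample; the column name matches the dataset, the values do not.

```python
import pandas as pd

cost = "approx_cost(for two people)"

# Hypothetical sample: BTM has three cheap restaurants and one expensive one.
sample = pd.DataFrame({
    "location": ["BTM", "BTM", "BTM", "BTM", "MG Road", "MG Road"],
    cost: [300, 300, 300, 900, 1000, 1200],
    "name": list("ABCDEF"),
})

# Mean over restaurants: every restaurant counts once.
per_restaurant = sample.groupby("location")[cost].mean()

# Mean over distinct price points: every price counts once per location,
# however many restaurants charge it (what the pivot above computes).
per_price_point = (
    sample.groupby(["location", cost])["name"].count()
    .reset_index()
    .groupby("location")[cost].mean()
)
print(per_restaurant)
print(per_price_point)
```

In the sample the two figures differ for BTM because most of its restaurants sit at the low price; which one to report depends on whether the question is about typical restaurants or about the range of prices on offer.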
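# Count the non-null entries of every column per (location, rest_type, listed_in(type))
# group, then pivot to one row per location; each counted column is repeated for
# every (rest_type, listed_in(type)) pair, hence the large number of columns.
p=df.groupby(['location', 'rest_type','listed_in(type)']).count().reset_index()
p=pd.pivot_table(p,index=['location'],columns=['rest_type','listed_in(type)'],fill_value=0,aggfunc='sum')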
p
| approx_cost(for two people) | ... | votes | |||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rest_type | Cafe | Casual Dining | Delivery | ... | Quick Bites | Takeaway, Delivery | others | ||||||||||||||
| listed_in(type) | Cafes | Delivery | Desserts | Dine-out | Buffet | Cafes | Delivery | Desserts | Dine-out | Delivery | ... | Dine-out | Delivery | Desserts | Buffet | Cafes | Delivery | Desserts | Dine-out | Drinks & nightlife | Pubs and bars |
| location | |||||||||||||||||||||
| BTM | 19 | 34 | 0 | 22 | 2 | 0 | 140 | 0 | 82 | 92 | ... | 469 | 129 | 3 | 5 | 8 | 155 | 41 | 35 | 13 | 9 |
| Banashankari | 9 | 8 | 1 | 7 | 3 | 0 | 32 | 0 | 34 | 12 | ... | 117 | 2 | 1 | 0 | 0 | 22 | 20 | 17 | 11 | 0 |
| Banaswadi | 12 | 8 | 2 | 10 | 0 | 0 | 17 | 0 | 17 | 0 | ... | 58 | 10 | 4 | 0 | 0 | 21 | 13 | 8 | 2 | 0 |
| Bannerghatta Road | 10 | 10 | 1 | 9 | 0 | 0 | 80 | 0 | 56 | 31 | ... | 130 | 19 | 5 | 0 | 3 | 36 | 21 | 29 | 1 | 0 |
| Basavanagudi | 3 | 9 | 0 | 3 | 2 | 0 | 18 | 1 | 21 | 0 | ... | 96 | 3 | 0 | 0 | 0 | 16 | 16 | 3 | 3 | 0 |
| Bellandur | 11 | 0 | 0 | 11 | 4 | 0 | 43 | 0 | 46 | 18 | ... | 112 | 10 | 0 | 6 | 1 | 45 | 26 | 48 | 6 | 6 |
| Brigade Road | 14 | 9 | 3 | 9 | 0 | 0 | 38 | 0 | 36 | 0 | ... | 41 | 0 | 0 | 0 | 0 | 29 | 22 | 31 | 19 | 6 |
| Brookefield | 3 | 0 | 0 | 3 | 0 | 0 | 20 | 0 | 21 | 2 | ... | 61 | 3 | 0 | 2 | 0 | 30 | 13 | 15 | 0 | 0 |
| Church Street | 11 | 0 | 0 | 10 | 0 | 0 | 25 | 0 | 33 | 0 | ... | 25 | 0 | 0 | 5 | 0 | 8 | 5 | 5 | 1 | 2 |
| Electronic City | 3 | 2 | 0 | 2 | 2 | 0 | 34 | 1 | 45 | 8 | ... | 101 | 11 | 0 | 0 | 1 | 28 | 14 | 20 | 7 | 5 |
| Frazer Town | 7 | 14 | 0 | 6 | 0 | 0 | 32 | 0 | 16 | 10 | ... | 41 | 10 | 3 | 0 | 1 | 24 | 28 | 10 | 0 | 0 |
| HSR | 9 | 8 | 1 | 9 | 3 | 0 | 101 | 0 | 50 | 71 | ... | 127 | 62 | 5 | 0 | 0 | 82 | 28 | 14 | 0 | 3 |
| Indiranagar | 23 | 25 | 8 | 17 | 14 | 0 | 99 | 2 | 56 | 58 | ... | 88 | 18 | 4 | 1 | 5 | 63 | 26 | 9 | 5 | 7 |
| JP Nagar | 12 | 3 | 2 | 12 | 17 | 0 | 108 | 0 | 100 | 42 | ... | 119 | 34 | 2 | 0 | 3 | 57 | 26 | 18 | 6 | 0 |
| Jayanagar | 14 | 10 | 6 | 12 | 8 | 0 | 102 | 0 | 59 | 2 | ... | 153 | 6 | 0 | 0 | 2 | 55 | 30 | 14 | 0 | 0 |
| Kalyan Nagar | 9 | 5 | 0 | 8 | 1 | 2 | 37 | 2 | 39 | 15 | ... | 58 | 2 | 0 | 0 | 0 | 12 | 10 | 4 | 0 | 0 |
| Kammanahalli | 7 | 3 | 2 | 4 | 0 | 0 | 17 | 0 | 18 | 6 | ... | 44 | 5 | 2 | 0 | 2 | 10 | 2 | 6 | 4 | 0 |
| Koramangala 1st Block | 9 | 13 | 0 | 12 | 0 | 0 | 28 | 0 | 18 | 38 | ... | 106 | 24 | 0 | 0 | 0 | 33 | 16 | 12 | 3 | 4 |
| Koramangala 4th Block | 18 | 11 | 0 | 18 | 9 | 0 | 26 | 0 | 29 | 44 | ... | 24 | 10 | 4 | 0 | 0 | 19 | 16 | 15 | 15 | 9 |
| Koramangala 5th Block | 23 | 16 | 0 | 19 | 10 | 0 | 60 | 0 | 68 | 11 | ... | 60 | 10 | 4 | 0 | 0 | 60 | 25 | 29 | 0 | 3 |
| Koramangala 6th Block | 11 | 10 | 0 | 7 | 9 | 0 | 46 | 0 | 39 | 11 | ... | 76 | 22 | 4 | 0 | 0 | 18 | 9 | 24 | 8 | 3 |
| Koramangala 7th Block | 17 | 6 | 0 | 11 | 0 | 0 | 19 | 0 | 32 | 55 | ... | 58 | 5 | 0 | 5 | 0 | 27 | 29 | 15 | 0 | 3 |
| Lavelle Road | 2 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 5 | 0 | 21 | 14 | 15 | 6 | 6 |
| MG Road | 22 | 12 | 0 | 18 | 10 | 0 | 10 | 0 | 11 | 0 | ... | 43 | 0 | 0 | 0 | 10 | 32 | 9 | 39 | 11 | 4 |
| Malleshwaram | 15 | 3 | 2 | 9 | 7 | 0 | 15 | 0 | 22 | 0 | ... | 73 | 2 | 2 | 2 | 0 | 37 | 30 | 23 | 2 | 2 |
| Marathahalli | 5 | 4 | 1 | 3 | 3 | 0 | 77 | 1 | 64 | 34 | ... | 141 | 28 | 2 | 6 | 0 | 57 | 27 | 25 | 6 | 2 |
| New BEL Road | 3 | 7 | 0 | 3 | 2 | 0 | 22 | 0 | 19 | 7 | ... | 46 | 4 | 0 | 0 | 1 | 14 | 6 | 8 | 2 | 2 |
| Rajajinagar | 3 | 0 | 2 | 0 | 3 | 0 | 16 | 0 | 25 | 3 | ... | 87 | 7 | 2 | 0 | 0 | 7 | 10 | 4 | 1 | 1 |
| Residency Road | 6 | 5 | 0 | 6 | 5 | 0 | 15 | 0 | 28 | 3 | ... | 31 | 2 | 0 | 0 | 0 | 0 | 0 | 48 | 30 | 18 |
| Richmond Road | 10 | 1 | 0 | 10 | 9 | 0 | 30 | 0 | 45 | 5 | ... | 53 | 5 | 0 | 5 | 0 | 10 | 20 | 30 | 6 | 6 |
| Sarjapur Road | 2 | 5 | 5 | 1 | 5 | 0 | 32 | 0 | 32 | 39 | ... | 64 | 12 | 3 | 0 | 0 | 34 | 21 | 8 | 1 | 1 |
| Shanti Nagar | 9 | 0 | 0 | 5 | 0 | 0 | 30 | 3 | 18 | 4 | ... | 54 | 0 | 0 | 0 | 0 | 15 | 11 | 5 | 0 | 0 |
| Ulsoor | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 0 | 11 | 7 | ... | 84 | 0 | 0 | 0 | 0 | 29 | 13 | 15 | 8 | 4 |
| Whitefield | 12 | 5 | 3 | 10 | 6 | 0 | 79 | 0 | 87 | 49 | ... | 136 | 25 | 2 | 1 | 0 | 72 | 29 | 57 | 16 | 5 |
34 rows × 248 columns
p.plot(kind='bar',figsize=(20,8),title='Visualization of type of restaurant based on different location and cost');
Fig 13: shows the types of restaurants and their categories. Delivery restaurants are more numerous than buffets or pubs and bars.
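With 248 columns the bar chart is hard to read, because the pivot repeats the category grid once for every counted column. A narrower table, one column per category, can be sketched with `size()` and `unstack()`; the sample below is made up and uses only the `listed_in(type)` column, which is a simplification of the pivot above.

```python
import pandas as pd

# Hypothetical sample with the two columns needed for a per-category count.
sample = pd.DataFrame({
    "location": ["BTM", "BTM", "BTM", "Whitefield", "Whitefield"],
    "listed_in(type)": ["Delivery", "Delivery", "Dine-out", "Delivery", "Buffet"],
})

# size() counts rows per group; unstack() moves the category into the columns.
by_type = (
    sample.groupby(["location", "listed_in(type)"])
    .size()
    .unstack(fill_value=0)
)

# Total restaurants per category, largest first.
totals = by_type.sum().sort_values(ascending=False)
print(by_type)
print(totals)
```

A table of this shape plots as one bar group per location with a handful of bars each, and `totals` gives the overall ranking of categories directly.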
# Count the non-null entries of every column per (location, cuisines) group,
# then pivot to one row per location with one column per cuisine combination.
q=df.groupby(['location', 'cuisines']).count().reset_index()
q=pd.pivot_table(q,index=['location'],columns=['cuisines'],fill_value=0,aggfunc='sum')
q
| approx_cost(for two people) | ... | votes | |||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| cuisines | Andhra | Andhra, Biryani | Arabian | Bakery | Bakery, Desserts | Bakery, Fast Food | Beverages | Beverages, Desserts | Beverages, Fast Food | Biryani | ... | Pizza, Fast Food | South Indian | South Indian, Biryani | South Indian, Chinese | South Indian, Chinese, North Indian | South Indian, Fast Food | South Indian, North Indian | South Indian, North Indian, Chinese | South Indian, North Indian, Chinese, Street Food | Street Food |
| location | |||||||||||||||||||||
| BTM | 10 | 16 | 31 | 30 | 24 | 0 | 60 | 2 | 32 | 77 | ... | 3 | 59 | 12 | 16 | 10 | 0 | 11 | 67 | 9 | 22 |
| Banashankari | 2 | 0 | 2 | 5 | 21 | 0 | 0 | 0 | 0 | 12 | ... | 5 | 55 | 4 | 2 | 6 | 5 | 14 | 18 | 16 | 5 |
| Banaswadi | 0 | 0 | 2 | 4 | 8 | 1 | 2 | 0 | 0 | 15 | ... | 1 | 27 | 7 | 2 | 0 | 0 | 8 | 8 | 3 | 5 |
| Bannerghatta Road | 7 | 11 | 2 | 14 | 15 | 2 | 2 | 0 | 4 | 9 | ... | 3 | 20 | 3 | 2 | 0 | 0 | 21 | 27 | 3 | 17 |
| Basavanagudi | 0 | 0 | 0 | 13 | 9 | 0 | 0 | 8 | 6 | 6 | ... | 2 | 84 | 3 | 3 | 0 | 5 | 0 | 29 | 3 | 15 |
| Bellandur | 6 | 2 | 4 | 6 | 20 | 1 | 0 | 4 | 13 | 20 | ... | 4 | 20 | 4 | 4 | 0 | 0 | 4 | 13 | 8 | 11 |
| Brigade Road | 0 | 0 | 0 | 22 | 0 | 4 | 0 | 0 | 7 | 0 | ... | 3 | 12 | 3 | 20 | 0 | 0 | 13 | 22 | 0 | 0 |
| Brookefield | 0 | 0 | 2 | 8 | 5 | 0 | 7 | 6 | 12 | 10 | ... | 5 | 24 | 5 | 0 | 0 | 0 | 4 | 18 | 0 | 7 |
| Church Street | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 0 | ... | 1 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 0 |
| Electronic City | 7 | 10 | 2 | 6 | 13 | 6 | 1 | 3 | 3 | 5 | ... | 5 | 21 | 0 | 2 | 0 | 2 | 8 | 21 | 2 | 4 |
| Frazer Town | 0 | 0 | 0 | 16 | 23 | 10 | 0 | 0 | 4 | 9 | ... | 4 | 37 | 7 | 7 | 0 | 6 | 0 | 8 | 0 | 6 |
| HSR | 10 | 2 | 8 | 10 | 33 | 5 | 11 | 0 | 8 | 27 | ... | 5 | 56 | 9 | 0 | 0 | 12 | 10 | 1 | 25 | 12 |
| Indiranagar | 14 | 2 | 0 | 17 | 18 | 0 | 3 | 0 | 6 | 33 | ... | 4 | 41 | 6 | 1 | 1 | 7 | 3 | 18 | 2 | 8 |
| JP Nagar | 4 | 8 | 2 | 10 | 50 | 2 | 0 | 0 | 10 | 11 | ... | 2 | 46 | 9 | 2 | 1 | 0 | 7 | 42 | 16 | 7 |
| Jayanagar | 8 | 4 | 2 | 6 | 11 | 11 | 13 | 13 | 11 | 19 | ... | 2 | 86 | 4 | 10 | 4 | 7 | 22 | 49 | 14 | 9 |
| Kalyan Nagar | 4 | 0 | 4 | 4 | 18 | 3 | 2 | 0 | 6 | 14 | ... | 6 | 11 | 0 | 0 | 0 | 0 | 2 | 4 | 0 | 5 |
| Kammanahalli | 0 | 0 | 13 | 0 | 4 | 0 | 0 | 0 | 0 | 6 | ... | 0 | 16 | 5 | 0 | 0 | 0 | 0 | 0 | 5 | 2 |
| Koramangala 1st Block | 0 | 0 | 0 | 12 | 29 | 1 | 0 | 0 | 7 | 15 | ... | 0 | 3 | 0 | 6 | 0 | 0 | 9 | 0 | 0 | 0 |
| Koramangala 4th Block | 0 | 0 | 0 | 0 | 22 | 0 | 3 | 0 | 0 | 0 | ... | 3 | 4 | 0 | 0 | 0 | 0 | 13 | 0 | 5 | 0 |
| Koramangala 5th Block | 15 | 0 | 6 | 0 | 28 | 1 | 20 | 20 | 6 | 13 | ... | 1 | 15 | 0 | 0 | 0 | 0 | 0 | 15 | 9 | 3 |
| Koramangala 6th Block | 0 | 9 | 0 | 5 | 13 | 1 | 0 | 0 | 0 | 13 | ... | 9 | 41 | 0 | 0 | 0 | 0 | 4 | 9 | 0 | 0 |
| Koramangala 7th Block | 0 | 0 | 0 | 10 | 11 | 11 | 5 | 0 | 0 | 26 | ... | 2 | 11 | 10 | 5 | 0 | 0 | 0 | 10 | 0 | 5 |
| Lavelle Road | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 6 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| MG Road | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 11 | 15 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 3 |
| Malleshwaram | 0 | 4 | 0 | 9 | 8 | 2 | 2 | 2 | 0 | 2 | ... | 9 | 54 | 4 | 0 | 0 | 7 | 3 | 12 | 1 | 5 |
| Marathahalli | 10 | 5 | 0 | 12 | 24 | 4 | 6 | 6 | 6 | 7 | ... | 4 | 28 | 4 | 9 | 0 | 0 | 5 | 11 | 0 | 10 |
| New BEL Road | 0 | 0 | 2 | 3 | 6 | 2 | 0 | 0 | 2 | 5 | ... | 4 | 12 | 2 | 0 | 1 | 0 | 0 | 6 | 0 | 4 |
| Rajajinagar | 0 | 0 | 0 | 6 | 10 | 1 | 0 | 0 | 0 | 12 | ... | 2 | 41 | 13 | 8 | 2 | 2 | 2 | 7 | 8 | 8 |
| Residency Road | 0 | 10 | 0 | 10 | 5 | 0 | 15 | 0 | 2 | 14 | ... | 0 | 0 | 10 | 10 | 0 | 0 | 0 | 0 | 0 | 4 |
| Richmond Road | 0 | 0 | 0 | 8 | 0 | 22 | 0 | 0 | 0 | 5 | ... | 0 | 19 | 0 | 0 | 0 | 0 | 7 | 6 | 0 | 0 |
| Sarjapur Road | 4 | 3 | 2 | 13 | 30 | 4 | 0 | 0 | 3 | 16 | ... | 7 | 18 | 4 | 2 | 0 | 0 | 0 | 15 | 1 | 2 |
| Shanti Nagar | 0 | 0 | 0 | 10 | 13 | 0 | 0 | 4 | 12 | 12 | ... | 6 | 29 | 0 | 8 | 0 | 0 | 0 | 5 | 0 | 0 |
| Ulsoor | 0 | 0 | 17 | 5 | 27 | 6 | 0 | 0 | 10 | 9 | ... | 0 | 19 | 0 | 11 | 0 | 0 | 0 | 0 | 0 | 11 |
| Whitefield | 4 | 6 | 4 | 10 | 18 | 6 | 3 | 1 | 3 | 29 | ... | 2 | 35 | 1 | 1 | 5 | 1 | 3 | 18 | 4 | 5 |
34 rows × 621 columns
q.plot(kind='hist',figsize=(20,8),title='Visualization of restaurant categories based on different location');
Fig 14: shows the variation across the different cuisines.
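The `cuisines` column holds comma-separated combinations such as "South Indian, North Indian, Chinese", so each combination becomes its own column in the pivot. To count individual cuisines instead, the strings can be split and exploded first. This is a minimal sketch on a made-up three-row sample.

```python
import pandas as pd

# Hypothetical sample in the same comma-separated format as the dataset.
sample = pd.DataFrame({
    "location": ["BTM", "BTM", "HSR"],
    "cuisines": [
        "North Indian, Chinese",
        "Biryani",
        "South Indian, North Indian, Chinese",
    ],
})

# Split each string into a list, give every cuisine its own row,
# strip the spaces left after the commas, then count occurrences.
cuisine_counts = (
    sample["cuisines"]
    .str.split(",")
    .explode()
    .str.strip()
    .value_counts()
)
print(cuisine_counts)
```

Counting this way lets a statement about the most popular cuisines rest on a single ranked list rather than on hundreds of combination columns.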
The cleaned data contained the following statistical data
| | rate | votes | approx_cost(for two people) |
|---|---|---|---|
| count | 51717.000000 | 51717.000000 | 51717.000000 |
| mean | 2.981209 | 283.697527 | 551.715587 |
| std | 1.516766 | 803.838853 | 439.717709 |
| min | 0.000000 | 0.000000 | 0.000000 |
| 25% | 3.000000 | 7.000000 | 300.000000 |
| 50% | 3.600000 | 41.000000 | 400.000000 |
| 75% | 3.900000 | 198.000000 | 650.000000 |
| max | 4.900000 | 16832.000000 | 6000.000000 |
Fig 2: shows how many restaurants provide an online ordering facility. About 30000+ restaurants in Bengaluru provide online ordering, and about 20000+ do not.
The dataset collected from Zomato also lends itself to NLP work: the text in the dataset can be used for sentiment analysis or for a recommendation system. Based on the above analysis, Biryani, North Indian and South Indian cuisines are the most popular, and BTM Layout is one of the hotspots for dining. Restaurants that offer online ordering have a higher rating than those that do not. Likewise, restaurants that take reservations, online or offline, have a higher average rating than those that do not. Fast food chains and delivery chains are the most popular and have higher ratings as well.
All in all, anyone planning to open a restaurant in Bengaluru should note that online ordering and reservation facilities go together with higher ratings and help to retain customers. Busy areas like BTM Layout, Whitefield and Indiranagar will be hard to survive in, as the competition there is stronger.
www.kaggle.com